Privacy Preserving Spam Filtering

Authors

  • Manas A. Pathak
  • Mehrbod Sharifi
  • Bhiksha Raj
Abstract

Email is a private medium of communication, and its inherent privacy constraints are a major obstacle to developing effective spam filtering methods, which require access to a large amount of email data belonging to multiple users. To mitigate this problem, we envision a privacy-preserving spam filtering system in which the server, using primitives such as homomorphic encryption and randomization, can train and evaluate a logistic regression based spam classifier on the combined email data of all users without being able to observe any emails. We analyze the protocols for correctness and security, and run experiments with a prototype system on a large-scale spam filtering task. State-of-the-art spam filters often use character n-grams as features, which results in a large, sparse data representation that is not feasible to use directly with our training and evaluation protocols. We explore various data-independent dimensionality reduction techniques, which decrease the running time of the protocol, making it feasible to use in practice while achieving high accuracy.
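
As a rough illustration of what a data-independent reduction of character n-gram features might look like, the sketch below uses feature hashing. This is an assumption made for illustration only; the abstract does not say which reduction techniques the authors evaluated. The function names `char_ngrams` and `hashed_features`, the n-gram length of 3, and the output dimension are all hypothetical choices.

```python
import hashlib


def char_ngrams(text, n=3):
    """Return all overlapping character n-grams of `text`."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def hashed_features(text, n=3, dim=1024):
    """Map character n-gram counts into a fixed-length vector.

    The bucket for each n-gram depends only on the n-gram itself,
    not on any training data, so every user can compute the same
    mapping locally without sharing email content.
    """
    vec = [0] * dim
    for gram in char_ngrams(text, n):
        # hashlib gives a hash that is stable across processes,
        # unlike Python's built-in hash() for strings.
        digest = hashlib.md5(gram.encode("utf-8")).digest()
        index = int.from_bytes(digest[:4], "big") % dim
        vec[index] += 1
    return vec


features = hashed_features("cheap pills now", n=3, dim=64)
```

Here the vector length is fixed at `dim` no matter how many distinct n-grams occur, which is the property that would keep the cost of an encrypted training or evaluation step bounded. Distinct n-grams can collide in the same bucket, so a smaller `dim` trades accuracy for speed.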

Similar resources

Social Networks Privacy-Preserving On Collaborative Tagging and Spam Filter Using Naive Bayes Algorithm

Collaborative tagging is one of the most popular services available in social networks, and it allows users to classify either online or offline resources based on their feedback, delivered in the form of tags. Although tags may not be secret information, the wide use of collaborative tagging services increases the risk, thereby seriously compromising user privacy. In this paper, we make a contribu...

Spamdoop: A privacy-preserving Big Data platform for collaborative spam detection

Spam has become the platform of choice used by cyber-criminals to spread malicious payloads such as viruses and trojans. In this paper, we consider the problem of early detection of spam campaigns. Collaborative spam detection techniques can deal with large scale e-mail data contributed by multiple sources; however, they have the well-known problem of requiring disclosure of e-mail content. Dis...

A Mail Client Plugin for Privacy-Preserving Spam Filter Evaluation

We describe a plugin extension to the Thunderbird Mail Client to support standardized evaluation of multiple spam filters on private mail streams. Researchers need not view or handle the subject users’ messages and subject users need not be familiar with spam filter evaluation methodology. All that is required of the user is to install the plugin as a standard extension and to run it on his or ...

Email Spam Detection: a Symbiotic Feature Selection Approach Fostered by Evolutionary Computation

The electronic mail (email) is nowadays an essential communication service being widely used by most Internet users. One of the main problems affecting this service is the proliferation of unsolicited messages (usually denoted by spam) which, despite the efforts made by the research community, still remains as an inherent problem affecting this Internet service. In this perspective, this work p...

Symbiotic filtering for spam email detection

This paper presents a novel spam filtering technique called Symbiotic Filtering (SF) that aggregates distinct local filters from several users to improve the overall performance of spam detection. SF is a hybrid approach combining some features from both Collaborative (CF) and Content-Based Filtering (CBF). It allows for the use of social networks to personalize and tailor the set of filters t...

Journal:
  • CoRR

Volume: abs/1102.4021

Publication date: 2011